Abstract
Traditional clustering algorithms usually rely on a pre-defined similarity measure between unlabelled data in an attempt to identify natural classes of items. When compared to what a human expert would provide on the same data, the results obtained may be disappointing if the similarity measure employed by the system is too different from the one a human would use. To obtain clusters that better fit user expectations, we can exploit, in addition to the unlabelled data, some limited form of supervision, such as constraints specifying whether or not two data items belong to the same cluster. The resulting approach is called semi-supervised clustering. In this paper, we put forward a new semi-supervised clustering algorithm, Pairwise-Constrained Competitive Agglomeration, which performs clustering by minimizing a competitive agglomeration cost function augmented with a fuzzy term corresponding to constraint violations. We present comparisons on a simple benchmark and on an image database.
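The abstract describes the objective only in words. As a hedged illustration, the sketch below evaluates one assumed form of such a cost: a fuzzy c-means style compactness term (squared memberships, i.e. fuzzifier 2), plus a penalty for fuzzy violations of must-link and cannot-link constraints weighted by `alpha`, minus a competitive-agglomeration term weighted by `beta` that rewards fewer, larger clusters. The exact form, the squared Euclidean distance, and all names (`pcca_cost`, `sq_dist`, `must_link`, `cannot_link`, `alpha`, `beta`) are assumptions for illustration, not the paper's code; the sketch only evaluates the cost and does not implement the membership and prototype updates that minimize it.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed; the abstract does not name a distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pcca_cost(
    points: List[Point],
    prototypes: List[Point],
    u: List[List[float]],  # u[i][k]: membership of point i in cluster k (rows assumed to sum to 1, not checked)
    must_link: List[Pair],
    cannot_link: List[Pair],
    alpha: float,
    beta: float,
) -> float:
    """Hypothetical PCCA-style objective: an illustrative sketch, not the authors' code."""
    n, c = len(points), len(prototypes)

    # Compactness: squared memberships times squared distances to prototypes.
    compact = sum(
        u[i][k] ** 2 * sq_dist(points[i], prototypes[k])
        for i in range(n)
        for k in range(c)
    )

    # Must-link violation: membership mass of i and j placed in different clusters.
    ml = sum(
        u[i][k] * u[j][m]
        for i, j in must_link
        for k in range(c)
        for m in range(c)
        if m != k
    )

    # Cannot-link violation: membership mass of i and j placed in the same cluster.
    cl = sum(u[i][k] * u[j][k] for i, j in cannot_link for k in range(c))

    # Competitive agglomeration: sum of squared fuzzy cluster cardinalities;
    # subtracting it favours merging points into fewer, larger clusters.
    agglo = sum(sum(u[i][k] for i in range(n)) ** 2 for k in range(c))

    return compact + alpha * (ml + cl) - beta * agglo


if __name__ == "__main__":
    # Three 1-D points, two prototypes, crisp memberships.
    pts = [[0.0], [1.0], [10.0]]
    protos = [[0.5], [10.0]]
    ok = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # respects must-link (0, 1) and cannot-link (0, 2)
    bad = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # splits the must-linked pair (0, 1)
    print(pcca_cost(pts, protos, ok, [(0, 1)], [(0, 2)], alpha=1.0, beta=0.1))   # 0.0
    print(pcca_cost(pts, protos, bad, [(0, 1)], [(0, 2)], alpha=1.0, beta=0.1))  # 81.75
```

In this assumed form the constraint term is "fuzzy" in the sense that a violation is counted in proportion to the memberships involved rather than as a hard 0/1 penalty, so partially split must-linked pairs are penalized only partially.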
Similar resources
Apport automatisé de sémantique lors de manipulations de documents géographiques
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau...
Composition of fish communities in macrotidal salt marshes of the Mont Saint-Michel bay (France)
Metaheuristic clustering of Persian XML documents based on structural and content similarity
Due to the increasing number of XML documents, organizing them effectively in order to retrieve useful information from them is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
Recherche d'information dans les documents numériques : vers une variation des modalités d'exécution procédurale
Document clustering based on ontology and a fuzzy approach
Data mining, also known as knowledge discovery in databases, is the process of discovering previously unknown knowledge from large amounts of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised grouping of similar documents into different groups, is one of the most important techniques of text mining. The most important step...
Publication date: 2005